Probabilistic Data Programming with ENFrame
Abstract
This paper overviews ENFrame, a programming framework for probabilistic data. In addition to relational query processing supported via an existing probabilistic database management system, ENFrame allows programming with loops, assignments, conditionals, list comprehension, and aggregates to encode complex tasks such as clustering and classification of probabilistic data. We explain the design choices behind ENFrame, some distilled from the wealth of work on probabilistic databases and some new. We also highlight a few challenges lying ahead.

1 Motivation and Scope

Probabilistic data management has come a long, fruitful way in the last decade [20]. We have a good understanding of the space of possible relational and hierarchical data models and its implications for query tractability. The community has already delivered several open-source systems that exploit the first-order structure of database queries for scalable inference, e.g., MystiQ [3], Trio [22], MayBMS/SPROUT [11], and PrDB [19], to name just a few, as well as applications in the space of web data management [8, 6]. Significantly less effort has been spent on supporting complex data processing beyond mere querying, such as general-purpose programming.

There is a growing need for computing frameworks that allow users to build applications feeding on uncertain data without worrying about the underlying uncertain nature of such data or the computationally hard inference task that comes along with it. For tasks that only need to query probabilistic data, existing probabilistic database systems do offer a viable solution [20]. For more complex tasks, however, successful development requires a high level of expertise in probabilistic databases, which hinders both the adoption of existing technology and communication between potential users and experts. A similar observation has recently been made in the areas of machine learning [5] and programming languages [10].
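To make the programming model described above more concrete, the following is a minimal sketch of what it means to run an ordinary user program (a loop, a list comprehension, a conditional, an aggregate) over probabilistic data. It assumes a tuple-independent model in which each data point exists independently with a given probability, and it evaluates the program naively by enumerating every possible world. The data values, the fixed centroids, and the function names `user_program` and `possible_worlds_distribution` are all hypothetical, chosen for illustration. This brute-force enumeration is exponential in the number of points and is not how ENFrame evaluates programs; it only illustrates the semantics a user can rely on.

```python
from collections import defaultdict
from itertools import product

# Hypothetical tuple-independent probabilistic data: (value, probability).
# Each point is present independently with the stated probability.
POINTS = [(1.0, 0.5), (2.0, 0.5), (10.0, 0.5)]


def user_program(points):
    """Deterministic user code, written as if the data were certain.

    Performs one assignment step of 1-D clustering against two fixed
    (assumed) centroids and returns how many points fall in cluster 0.
    """
    centroids = [0.0, 8.0]
    count = 0
    for x in points:
        dists = [abs(x - c) for c in centroids]  # list comprehension
        if dists[0] == min(dists):               # conditional + aggregate
            count += 1
    return count


def possible_worlds_distribution(prob_points, program):
    """Run `program` in every possible world and sum the world weights
    per outcome. Exponential in len(prob_points); illustration only."""
    dist = defaultdict(float)
    for choices in product([True, False], repeat=len(prob_points)):
        world = [v for (v, _), keep in zip(prob_points, choices) if keep]
        weight = 1.0
        for (_, p), keep in zip(prob_points, choices):
            weight *= p if keep else 1.0 - p
        dist[program(world)] += weight
    return dict(dist)


result = possible_worlds_distribution(POINTS, user_program)
```

Here `result` maps each possible output of the program to its probability. The user code itself never mentions probabilities, which is the separation of concerns the text argues for.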
Developing programming languages that allow probabilistic models to be expressed concisely has become a hot research topic [18]. Such programming languages can be imperative (C-style) or declarative (first-order logic), with the novelty that they allow one to express probability distributions via generative stochastic models, to draw values at random from such distributions, and to condition values of program variables on observations. For inference, the programs are usually grounded to Bayesian networks and fed to MCMC methods [15]. In the area of databases, MCDB [13] and SimSQL [4] have been visionary in enabling stochastic analytics in the database by coupling Monte Carlo simulations with declarative SQL extensions and parallel database techniques.

The thesis of this work is that one can build powerful and useful probabilistic data programming frameworks that leverage existing work on probabilistic databases. ENFrame [21] is a framework that aims to fit this vision:

Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
Similar resources
Highlights on published work (with a bit of vision)
This paper overviews ENFrame, a framework for processing probabilistic data. In addition to relational query processing supported by existing probabilistic database management systems, ENFrame allows programming with loops, assignments, list comprehension, and aggregates to encode complex tasks such as clustering and classification of data retrieved via queries from probabilistic databases. We ...
ENFrame: A Platform for Processing Probabilistic Data
This paper introduces ENFrame, a unified data processing platform for querying and mining probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as bounded-range loops, list comprehension, aggregate operations on lists, and calls to external database engines. The program is then interpreted probabilistically by ENFrame. The realisation of ENFram...
Support vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of the Support Vector Machine (SVM). In this paper, a new SVR model with probabilistic constraints is proposed, in which the output data and the bias are treated as random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
Multi-item inventory model with probabilistic demand function under permissible delay in payment and fuzzy-stochastic budget constraint: A signomial geometric programming method
This study proposes a new multi-item inventory model with hybrid cost parameters under a fuzzy-stochastic constraint and permissible delay in payment. Stochastic demand that depends on price and marketing expenditure, and a unit production cost that depends on demand, are considered. Shortages are allowed and partially backordered. The main objective of this paper is to determine selling price, mar...
Using Probabilistic-Risky Programming Models in Identifying Optimized Pattern of Cultivation under Risk Conditions (Case Study: Shoshtar Region)
Using the Telser and Kataoka models of probabilistic-risky mathematical programming, the present research determines the optimized cultivation pattern for the agricultural products of the Shoshtar region under risky conditions. To account for risk in these models, the agricultural years 1996-1997 through 2004-2005 were taken into account. Results from Telser and Kataoka mod...
Journal title:
- IEEE Data Eng. Bull.
Volume 37, Issue
Pages -
Publication date: 2014